If the samples share all the taxonomic groups, the percent Jaccard index will be 100%
similar ($\beta_{jac} = 100\%$); the closer it is to 100%, the more similar the samples are. If the two
samples share no species/taxa, they will be 0% similar ($\beta_{jac} = 0$). If the percent Jaccard
index is 50%, half of the taxonomic groups found in the two samples are shared.
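The three cases above can be checked with a short Python sketch. It assumes the percent Jaccard index is the number of shared taxa divided by the number of distinct taxa found in either sample, multiplied by 100 (the usual definition). The function name `percent_jaccard` and the sample contents are hypothetical and chosen only for illustration.

```python
def percent_jaccard(taxa_a, taxa_b):
    """Percent Jaccard similarity between two collections of taxon names.

    Assumed definition: 100 * (shared taxa) / (distinct taxa in either sample).
    """
    set_a, set_b = set(taxa_a), set(taxa_b)
    union = set_a | set_b
    if not union:
        raise ValueError("both samples are empty")
    shared = set_a & set_b
    return 100.0 * len(shared) / len(union)


# Hypothetical samples, for illustration only
sample_a = ["Bacteroides", "Prevotella", "Roseburia"]
sample_b = ["Bacteroides", "Prevotella", "Roseburia",
            "Blautia", "Dialister", "Sutterella"]

print(percent_jaccard(sample_a, sample_a))         # 100.0 (all taxa shared)
print(percent_jaccard(sample_a, sample_b))         # 50.0  (3 shared out of 6 distinct)
print(percent_jaccard(sample_a, ["Akkermansia"]))  # 0.0   (nothing shared)
```

Note that under this definition a value of 50% means that half of the distinct taxa observed across both samples are present in both.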

7.2.5.2.2  Bray–Curtis Dissimilarity Index

$$\beta_{br} = \left(1 - \frac{2c}{a + b}\right) \times 100 \tag{7.14}$$

The percent Bray–Curtis dissimilarity is always a number between 0 and 100. If it is 0, then

the two samples share all the same species; if it is 100, that means the two samples do not

share any species.
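As a rough illustration, the percent Bray–Curtis dissimilarity can be computed in a few lines of Python. This sketch assumes the standard form of the index, (1 − 2c/(a + b)) × 100, where a and b are the total counts in each sample and c is the sum, over species, of the smaller of the two counts. These meanings of a, b, and c are an assumption taken from the common definition of the index; the function name `percent_bray_curtis` and the counts are hypothetical.

```python
def percent_bray_curtis(counts_a, counts_b):
    """Percent Bray-Curtis dissimilarity between two samples.

    counts_a, counts_b: dicts mapping species name -> count.
    Assumed form: (1 - 2c / (a + b)) * 100, written below as
    100 * (a + b - 2c) / (a + b), which is algebraically identical.
    """
    a = sum(counts_a.values())   # total count in sample A
    b = sum(counts_b.values())   # total count in sample B
    if a + b == 0:
        raise ValueError("both samples are empty")
    species = set(counts_a) | set(counts_b)
    # c: sum of the lesser count of each species across the two samples
    c = sum(min(counts_a.get(s, 0), counts_b.get(s, 0)) for s in species)
    return 100.0 * (a + b - 2 * c) / (a + b)


# Hypothetical counts, for illustration only
sample_a = {"sp1": 6, "sp2": 4}
sample_b = {"sp1": 2, "sp2": 4, "sp3": 4}

print(percent_bray_curtis(sample_a, sample_a))      # 0.0   (identical samples)
print(percent_bray_curtis(sample_a, sample_b))      # 40.0  (a = b = 10, c = 6)
print(percent_bray_curtis(sample_a, {"sp9": 5}))    # 100.0 (no shared species)
```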

7.2.5.2.3  Unweighted and Weighted UniFrac Distance Index

UniFrac is a phylogeny-based beta diversity index that takes into account the evolutionary
relatedness of the communities in the two samples. The UniFrac distance is defined as the
fraction of the observed branch length of the phylogenetic tree that is unique to one sample
or the other. If the communities of the two samples are identical, the UniFrac distance is
zero. If the two communities share no branches of the tree, that is, they are evolutionarily
unrelated, the UniFrac distance is 1.0. The more evolutionarily related the two communities
are, the closer the distance is to zero. Weighted UniFrac uses the relative abundances of
species in the samples as weights on the branch lengths (thus, it emphasizes the dominant
species), whereas unweighted UniFrac uses only presence or absence (thus, it emphasizes the
rare species).
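The definition of the unweighted UniFrac distance can be made concrete with a toy example. The sketch below is not how QIIME2 computes the distance; it assumes a deliberately simplified tree representation in which each branch is given as a pair (branch length, set of tips that descend from it). The function name `unweighted_unifrac`, the four-tip tree, and the tip labels are all hypothetical. The weighted variant is not covered here.

```python
def unweighted_unifrac(branches, taxa_a, taxa_b):
    """Unweighted UniFrac: unique branch length / observed branch length.

    branches: list of (length, tips) pairs, where tips is the set of tip
              labels descending from that branch (simplified representation).
    taxa_a, taxa_b: taxa present in each sample (presence/absence only).
    """
    set_a, set_b = set(taxa_a), set(taxa_b)
    unique = 0.0     # branch length leading to taxa from only one sample
    observed = 0.0   # branch length leading to taxa from either sample
    for length, tips in branches:
        in_a = bool(tips & set_a)
        in_b = bool(tips & set_b)
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length
    if observed == 0:
        raise ValueError("no branch of the tree is observed in either sample")
    return unique / observed


# Toy tree ((t1:1, t2:1):1, (t3:1, t4:1):1), for illustration only
branches = [
    (1.0, {"t1"}), (1.0, {"t2"}), (1.0, {"t1", "t2"}),
    (1.0, {"t3"}), (1.0, {"t4"}), (1.0, {"t3", "t4"}),
]

print(unweighted_unifrac(branches, ["t1", "t2"], ["t1", "t2"]))  # 0.0 (identical)
print(unweighted_unifrac(branches, ["t1", "t2"], ["t3", "t4"]))  # 1.0 (no shared branches)
print(unweighted_unifrac(branches, ["t1", "t2"], ["t2", "t3"]))  # 0.6 (3 of 5 observed units unique)
```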

7.3  DATA ANALYSIS WITH QIIME2

Now it is time to get your hands dirty with some worked examples that cover raw data
preprocessing, read clustering, denoising, taxonomic assignment, phylogenetic tree
construction, and diversity analysis. For this purpose, we will use QIIME2 (Quantitative
Insights Into Microbial Ecology 2) [15], which is one of the most commonly used free programs
for the analysis of amplicon-based microbial sequencing data. QIIME2 can be used to analyze
any targeted gene sequencing data, but its modules for the analysis of metagenomic data based
on the 16S rRNA gene are very well established. QIIME2 can be installed on different platforms.
For detailed installation instructions, visit “https://docs.qiime2.org/2022.2/install/”.
If you have Anaconda installed on your Linux computer, you can install it with “conda
install -c qiime2 qiime2”; however, make sure that all requirements are met. We will use
QIIME2 under the Anaconda environment. After installing QIIME2 under Anaconda,
run “conda activate qiime2” on the Linux terminal to activate the QIIME2 environment
(substitute the name of your own QIIME2 environment if it differs). Once it has been
activated, the terminal prompt will change to something like “(qiime2)$”.
Then, you can run any QIIME2 command. To display the available QIIME2 commands,
run the following:

(qiime2)$ qiime